Abstract

The skew of a binary string is the difference between the number of 
zeroes and the number of ones, while the length of the string is the sum of 
these two numbers. We consider certain suffixes of the lexicographically-least
de Bruijn sequence at natural breakpoints of the binary string. We
show that the skew and length of these suffixes are enumerated by
sequences generalizing the Fibonacci and Lucas numbers, respectively.

1 Introduction 

Let $w = a_1 a_2 \cdots a_l$ be a word over the alphabet $\{0, 1\}$ of length $|w| = l$. When
$|w| = 2^n$ and the indices of $w$ are interpreted cyclically, the word is said to be a
binary de Bruijn sequence of order $n$ if it contains each of the $2^n$ distinct binary
strings of length $n$ as a subword. The string 00010111 is a binary de Bruijn
sequence of order 3.
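
The definition can be checked mechanically. The following is a minimal sketch, not part of the original argument; the function name `is_de_bruijn` is an illustrative choice.

```python
def is_de_bruijn(word, n):
    """Return True if `word`, read cyclically, is a binary de Bruijn sequence of order n."""
    if len(word) != 2 ** n or set(word) - {"0", "1"}:
        return False
    # Append the first n-1 symbols so that cyclic windows become ordinary slices.
    extended = word + word[: n - 1]
    windows = {extended[i : i + n] for i in range(len(word))}
    # There are 2^n windows, so they are pairwise distinct exactly
    # when every binary string of length n occurs.
    return len(windows) == 2 ** n


print(is_de_bruijn("00010111", 3))  # True
```

The same check accepts the order-4 example 0000100110101111 given in the next paragraphs.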

A binary necklace is an equivalence class of binary words under rotation.
The representative element for the equivalence class is chosen to be the
lexicographically least one. A binary string is a Lyndon word if it is an aperiodic
necklace representative. The binary Lyndon words of length $\le 4$ are 0, 1, 01,
001, 011, 0001, 0011, 0111.

De Bruijn sequences and Lyndon words are related via the "Ford sequence,"
denoted here $\mathcal{F}_n$, which is the lexicographically least binary de Bruijn sequence
of order $n$. Fredricksen proved [2] that $\mathcal{F}_n$ is obtained by concatenating all
Lyndon words of lengths dividing $n$ in lexicographic order. For instance, $\mathcal{F}_4 =
0000100110101111$. We note that this result generalizes to constructing the
lexicographically-least de Bruijn sequence over an arbitrary alphabet [3H]. 
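
As an illustration of Fredricksen's construction, here is a brute-force sketch that enumerates Lyndon words by testing every binary string. The names `is_lyndon` and `ford_sequence` are assumptions made for this example, and the exhaustive search is only practical for small $n$.

```python
from itertools import product


def is_lyndon(w):
    """A word is Lyndon if it is strictly smaller than each of its proper rotations."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))


def ford_sequence(n):
    """Concatenate, in lexicographic order, the binary Lyndon words whose length divides n."""
    words = [
        "".join(bits)
        for d in range(1, n + 1)
        if n % d == 0
        for bits in product("01", repeat=d)
        if is_lyndon("".join(bits))
    ]
    return "".join(sorted(words))


print(ford_sequence(4))  # 0000100110101111
```

For $n = 4$ the sorted Lyndon words are 0, 0001, 0011, 01, 0111, 1, which concatenate to the string quoted above.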

The Ford sequence is also generated by applying a greedy strategy to the
production of a binary de Bruijn sequence. The algorithm constructs $\mathcal{F}_n$ one bit
at a time, preferring 0's to 1's whenever possible. Given this, it is reasonable to
expect that initial segments of $\mathcal{F}_n$ contain many more zeros than ones. In fact,
previous work [1] shows that the maximum difference (called the discrepancy)
of $\mathcal{F}_n$ is $\Theta(2^n \log n / n)$.




The discrepancy is the maximum possible "skew" over all prefixes of $\mathcal{F}_n$. The
skew of a binary string $w$ of length $l$, denoted $\mathrm{sk}(w)$, is the difference between
the number of zeros and the number of ones. Since the length of $w$ is the sum
of these two numbers, we have that

$$\mathrm{sk}(w) = \sum_{i=1}^{l} (-1)^{a_i} \quad \text{and} \quad |w| = \sum_{i=1}^{l} (1)^{a_i}.$$
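
The skew and the discrepancy are easy to compute directly. The sketch below uses hypothetical helper names (`skew`, `prefix_skews`, `discrepancy`) and takes the order-4 Ford sequence 0000100110101111 from the introduction as input.

```python
from itertools import accumulate


def skew(w):
    """Number of 0's minus number of 1's, as the sum of (-1)^(a_i)."""
    return sum((-1) ** int(a) for a in w)


def prefix_skews(w):
    """Skew of every nonempty prefix of w, in order of increasing length."""
    return list(accumulate((-1) ** int(a) for a in w))


def discrepancy(w):
    """Maximum skew over all nonempty prefixes of w."""
    return max(prefix_skews(w))


F4 = "0000100110101111"
print(skew(F4), len(F4), discrepancy(F4))  # 0 16 5
```

The full sequence has skew 0, since a de Bruijn sequence contains equally many zeros and ones, while the prefix 0000100 attains the maximum skew 5.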

Figure 1 illustrates the discrepancy for $n = 4, 5, 6, 7$ by graphing the skew of
all prefixes of $\mathcal{F}_n$. As illustrated on the graphs, there are natural breakpoints
in $\mathcal{F}_n$ following the occurrence of the subword $0^i 1^{n-i}$ for $1 \le i \le n - 1$. The
cases where $i = 0$ and $i = n$ are the final 1 and initial 0, respectively.

In this article, we prove that the skew at these breakpoints gives sequences
of values which are Fibonacci-like. Our results are given in terms of suffixes
of the Ford sequence, which are directly related to prefixes by $\mathrm{sk}(\mathcal{F}_n) = 0$ and
$|\mathcal{F}_n| = 2^n$. As stated precisely in Theorem 1 below, we show that the skew
and the length of these breakpoint suffixes of $\mathcal{F}_n$ are enumerated by sequences
generalizing the Fibonacci and Lucas numbers, respectively.

2 Preliminaries and statement of main result 

Let $\ell_0 = 1$ and, for $1 \le i \le n - 1$, let $\ell_i$ be the subword of $\mathcal{F}_n$ which begins
immediately after the Lyndon word $0^{i+1} 1^{n-i-1}$ and ends with the string $0^i 1^{n-i}$.

Table 1: Values of $\mathrm{sk}(\mathcal{K}_m)$ when $0 \le m \le 9$ for each $\mathcal{F}_n$ with $1 \le n \le 10$.

Table 2: Values of $|\mathcal{K}_m|$ when $0 \le m \le 9$ for each $\mathcal{F}_n$ with $1 \le n \le 10$.



Hence, $\ell_i$ consists of all Lyndon words of length $d \mid n$, in lexicographic order,
which contain the substring $0^i$ but not $0^{i+1}$. For technical reasons, if $i > n - 1$,
then we define $\ell_i = \epsilon$, the empty string.

Let $\mathcal{L}_m$ be the concatenation of $\ell_m \ell_{m-1} \cdots \ell_1$ for a given $\mathcal{F}_n$. Hence, $\mathcal{L}_m$
is the substring of the $n$-th Ford sequence which contains the Lyndon words of
length $d > 1$ with at least one and at most $m$ consecutive 0's. Let $\mathcal{K}_m$ be the
proper suffix of $\mathcal{F}_n$ consisting of the Lyndon words of length $d \mid n$ containing at
most $m$ consecutive 0's; that is, $\mathcal{K}_m = \mathcal{L}_m \ell_0$.

If $m \ge n - 1$, then $\mathcal{K}_m$ is $\mathcal{F}_n$ except for the initial 0. Also, $\mathcal{K}_0 = \ell_0 = 1$.

The Fibonacci numbers are defined by the recurrence $F_n = F_{n-1} + F_{n-2}$
with initial conditions $F_0 = 0$ and $F_1 = 1$. The Lucas numbers are defined by
the recurrence $L_n = L_{n-1} + L_{n-2}$ with initial conditions $L_0 = 2$ and $L_1 = 1$.
The (ordinary) generating functions for these sequences are $x/(1 - x - x^2)$ and
$(2 - x)/(1 - x - x^2)$, respectively. For a detailed treatment of generating functions
for recurrence relations, we refer the reader to [5].
Let

$$d_m(x) = 1 - x - x^2 - \cdots - x^m = 1 - x \sum_{i=0}^{m-1} x^i.$$

Let $G^m_n$ be the integer sequence defined by the $m$-th order recurrence $G^m_n =
\sum_{i=1}^{m} G^m_{n-i}$ with initial conditions $G^m_0 = G^m_1 = \cdots = G^m_{m-1} = 1$. This is a
generalization of the Fibonacci numbers, and when $m = 2$ we recover $G^2_n =
F_{n+1}$. It is straightforward to see that the sequence $G^m_n$ has the generating
function

$$\frac{1 - \sum_{i=2}^{m-1} (i - 1)\, x^i}{d_m(x)}.$$

There are many possible generalizations of the Fibonacci numbers, depending
on how the initial conditions $F_0 = 0$ and $F_1 = 1$ (and, in this case, $F_2 = 1$) are
extended.

There are likewise different generalizations of the Lucas numbers. Let $H^m_n$
be the sequence defined by the $m$-th order recurrence $H^m_n = \sum_{i=1}^{m} H^m_{n-i}$ with
initial conditions $H^m_0 = m$ and $H^m_i = 2^i - 1$ for $1 \le i \le m - 1$. So $H^2_n = L_n$,
and it is straightforward to see that the generating function for the sequence
$H^m_n$ is

$$\frac{m - \sum_{i=1}^{m-1} (m - i)\, x^i}{d_m(x)}.$$

In this article, we prove the following.

Theorem 1. For $\mathcal{F}_n$ with $n > 0$ and $m \ge 0$,

$$\mathrm{sk}(\mathcal{K}_m) = -G^{m+1}_{n-1} \quad \text{and} \quad |\mathcal{K}_m| = H^{m+1}_n.$$
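
Theorem 1 can be sanity-checked by brute force for small parameters. The sketch below is an illustration under stated assumptions rather than part of the proof: it reads the statement as $\mathrm{sk}(\mathcal{K}_m) = -G^{m+1}_{n-1}$ and $|\mathcal{K}_m| = H^{m+1}_n$, it builds $\mathcal{K}_m$ from all Lyndon words of length dividing $n$ with at most $m$ consecutive 0's while skipping the initial word 0 (as the remark on $m \ge n - 1$ requires), and all function names are hypothetical.

```python
from itertools import product


def is_lyndon(w):
    """A word is Lyndon if it is strictly smaller than each of its proper rotations."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))


def lyndon_words_dividing(n):
    """Binary Lyndon words of length d | n, in lexicographic order."""
    return sorted(
        "".join(b)
        for d in range(1, n + 1) if n % d == 0
        for b in product("01", repeat=d) if is_lyndon("".join(b))
    )


def longest_zero_run(w):
    return max(len(run) for run in w.split("1"))


def K(n, m):
    """Suffix K_m of F_n: words with at most m consecutive 0's, without the initial word '0'."""
    return "".join(w for w in lyndon_words_dividing(n)
                   if w != "0" and longest_zero_run(w) <= m)


def G(m, n):
    """G^m_n: m ones followed by the m-th order sum recurrence."""
    seq = [1] * m
    while len(seq) <= n:
        seq.append(sum(seq[-m:]))
    return seq[n]


def H(m, n):
    """H^m_n: H_0 = m, H_i = 2^i - 1 for 1 <= i <= m-1, then the m-th order sum recurrence."""
    seq = [m] + [2 ** i - 1 for i in range(1, m)]
    while len(seq) <= n:
        seq.append(sum(seq[-m:]))
    return seq[n]


def theorem1_holds(n, m):
    k = K(n, m)
    skew = k.count("0") - k.count("1")
    return skew == -G(m + 1, n - 1) and len(k) == H(m + 1, n)


print(all(theorem1_holds(n, m) for n in range(1, 11) for m in range(0, 10)))
```

The ranges $1 \le n \le 10$ and $0 \le m \le 9$ match those of Tables 1 and 2. For example, `K(4, 2)` is 00110101111, with length $11 = H^3_4$ and skew $-3 = -G^3_3$.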

3 Fibonacci, Lucas, De Bruijn, and Lyndon 

We first prove Theorem 1 for the special case when $m = 1$. That the result
holds when $m = 0$ follows directly from the definitions.

We define a Lyndon word $w$ to be a primitive of order $i$ if $w = 0^i 1^j$ with
$i + j = |w|$, $i, j \ge 1$. If a Lyndon word is not primitive, we say it is composite.

Let $w$ be a Lyndon word of length $d \mid n$ which occurs in $\ell_1$. Then $w$ may be
uniquely parsed into primitives of order 1 as

$$w = 01^{j_1}\, 01^{j_2} \cdots 01^{j_t}, \quad \text{where } j_l \ge 1 \text{ and } \sum_{l=1}^{t} (1 + j_l) = d.$$

Let $\phi$ be a mapping from primitives of order 1 into the integers where $\phi(01^j) =
1 + j$ and let $\Phi(n)$ be the multiset obtained by applying $\phi$ to the $01^j$ subwords
of $\ell_1$ from $\mathcal{F}_n$. For instance, in $\mathcal{F}_6$ we have $\ell_1 = 01010111011011111$ and
$\Phi(6) = \{2, 2, 3, 4, 6\}$.
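
The parsing into order-1 primitives can be reproduced with a regular expression. This is a small sketch with a hypothetical function name, applied to the string $\ell_1$ for $n = 6$ quoted above.

```python
import re


def phi_multiset(l1):
    """Split l1 into blocks of the form 0 1^j (j >= 1) and return the sorted block lengths 1 + j."""
    blocks = re.findall(r"01+", l1)
    # The parse is only meaningful if the blocks cover the whole string.
    if "".join(blocks) != l1:
        raise ValueError("input is not a concatenation of order-1 primitives")
    return sorted(len(b) for b in blocks)


print(phi_multiset("01010111011011111"))  # [2, 2, 3, 4, 6]
```

The blocks found are 01, 01, 0111, 011, 011111, whose lengths give the multiset above.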

Let $c(n, k)$ be the number of integers $k \ge 2$ in the multiset $\Phi(n)$. Since each
$\ell_1$ from $\mathcal{F}_n$ with $n > 1$ contains exactly one primitive of order 1 and length $n$,
we have that $c(n, k) = 1$ when $n = k$. Also, $c(n, k) = 0$ for $n < k$. As we show
below, the distribution for other $(n, k)$ is Fibonacci-like.

Recall that a composition of an integer $n$ into $k$ (positive) parts is an ordered
sum of integers

$$n = x_1 + x_2 + \cdots + x_k \quad \text{where } x_i \ge 1.$$

We denote such a composition of $n$ as an ordered $k$-tuple $x = (x_1, \ldots, x_k)$.
Since primitives of order 1 have length $k \ge 2$, in the proof below we consider
compositions of $n - k$ with parts greater than 1. There are $F_{n-k-1}$ distinct
$x = (x_1, \ldots, x_j)$, with $\sum_i x_i = n - k$ and $x_i \ge 2$, a fact easily obtained by
induction on $n - k$. We show that there are an equal number of distinct $01^{k-1}$
primitives in the substring $\ell_1$ of $\mathcal{F}_n$.

Theorem 2.

$$c(n, k) = F_{n-k-1} \quad \text{for } 2 \le k \le n - 1.$$

Proof. Let $k \ge 2$ be fixed. For $n = k + 1$, there cannot be a primitive of order
1 and length $k$ in $\ell_1$ of $\mathcal{F}_{k+1}$, so $c(k + 1, k) = 0$.

Let $n \ge k + 2$. Consider the compositions of $n - k$ with parts greater than
1. For each such $x = (x_1, \ldots, x_j)$, let $\omega(x)$ be the binary string

$$01^{k-1}\, 01^{x_1 - 1} \cdots 01^{x_j - 1}.$$

Suppose that $\omega(x)$ is aperiodic, and let $\lambda(x)$ be the Lyndon word which is
the representative element for the equivalence class under rotation of $\omega(x)$. If $k$
does not occur in $x$, then $\lambda(x)$ contributes exactly one integer $k$ to the multiset
$\Phi(n)$.

Otherwise, $\lambda(x)$ contributes $m + 1$ times to the count of $c(n, k)$, where there
are $m \ge 1$ parts of $x$ which equal $k$. In this case, there are $m$ additional
compositions $x^{j_1}, \ldots, x^{j_m}$ of $n - k$ such that $\omega(x^{j_1}), \ldots, \omega(x^{j_m})$ all belong to the
equivalence class of $\omega(x)$ under rotation. Hence, there are $m + 1$ compositions
of $n - k$ which are associated with the same Lyndon word of length $n$ from $\ell_1$.

Suppose now that $\omega(x)$ is periodic with period $p$. Let $\lambda(x)$ be the Lyndon
word of length $p$ such that $(\lambda(x))^{n/p}$ is an element of the rotational equivalence
class of $\omega(x)$. Then there are $q(n/p) - 1$ parts of $x$ which equal $k$ for some $q \ge 1$.
Hence, $\lambda(x)$ contributes $q$ integers $k$ to the multiset $\Phi(n)$. Observe that if $q = 1$,
then $x$ is the only composition of $n - k$ associated with $\lambda(x)$. Otherwise, there
are $q - 1$ other instances of $01^{k-1}$ in $\lambda(x)$ and $q - 1$ distinct compositions of
$n - k$ which are associated with $\lambda(x)$. □
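
The count $c(n, k) = F_{n-k-1}$ can be confirmed computationally for small $n$. The following brute-force sketch uses hypothetical names (`ell_1`, `c_table`, `theorem2_holds`) and builds $\ell_1$ as the Lyndon words of length $d \mid n$, $d > 1$, that avoid the factor 00.

```python
import re
from collections import Counter
from itertools import product


def is_lyndon(w):
    """A word is Lyndon if it is strictly smaller than each of its proper rotations."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))


def ell_1(n):
    """Lyndon words of length d | n, d > 1, avoiding 00, joined in lexicographic order."""
    words = sorted(
        "".join(b)
        for d in range(2, n + 1) if n % d == 0
        for b in product("01", repeat=d)
        if "00" not in "".join(b) and is_lyndon("".join(b))
    )
    return "".join(words)


def fib(j):
    """Fibonacci number F_j with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(j):
        a, b = b, a + b
    return a


def c_table(n):
    """Counter mapping k to c(n, k), the number of blocks 0 1^(k-1) in ell_1."""
    return Counter(len(block) for block in re.findall(r"01+", ell_1(n)))


def theorem2_holds(n):
    counts = c_table(n)
    return all(counts[k] == fib(n - k - 1) for k in range(2, n))


print(ell_1(6))  # 01010111011011111
print(all(theorem2_holds(n) for n in range(3, 13)))
```

For $n = 6$ the table gives $c(6,2) = 2$, $c(6,3) = c(6,4) = 1$, and $c(6,5) = 0$, matching $F_3, F_2, F_1, F_0$.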

Recall that the skew of a binary string $w$ is the difference between the number
of 0's in $w$ and the number of 1's, denoted here $e(w)$ and $\delta(w)$ respectively.
Hence,

$$\mathrm{sk}(w) = e(w) - \delta(w).$$

Theorem 3. For $\mathcal{F}_n$ with $n \ge 2$, $e(\ell_1) = F_{n-1}$ and $\delta(\ell_1) = F_{n+1} - 1$.






Proof. The result follows from Theorem 2 and the identities

$$\sum_{i=1}^{n} F_i = F_{n+2} - 1$$

and

$$\sum_{i=1}^{n} i \cdot F_{n-i} = F_{n+3} - (n + 2).$$

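Each primitive of order 1 and length $k$ contributes a zero to $e(\ell_1)$ and $k - 1$ ones
to $\delta(\ell_1)$. Hence,

$$\begin{aligned}
e(\ell_1) &= \sum_{k=2}^{n} c(n, k) \\
&= 1 + \sum_{k=2}^{n-1} F_{n-k-1} \\
&= F_{n-1}.
\end{aligned}$$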

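Likewise, we have that

$$\begin{aligned}
\delta(\ell_1) &= \sum_{k=2}^{n} (k - 1) \cdot c(n, k) \\
&= (n - 1) - \sum_{k=2}^{n-1} F_{n-k-1} + \sum_{k=2}^{n-1} k \cdot F_{n-k-1} \\
&= (n - 1) - (F_{n-1} - 1) - F_{n-2} + \sum_{k=1}^{n-1} k \cdot F_{n-1-k} \\
&= n - F_n + F_{n+2} - (n + 1) \\
&= F_{n+1} - 1.
\end{aligned}$$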

□ 
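
The two Fibonacci identities used in this proof, $\sum_{i=1}^{n} F_i = F_{n+2} - 1$ and $\sum_{i=1}^{n} i \cdot F_{n-i} = F_{n+3} - (n+2)$, are standard, and a quick numerical check is a useful guard against indexing slips. The helper names below are illustrative only.

```python
def fib(j):
    """Fibonacci number F_j with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(j):
        a, b = b, a + b
    return a


def sum_identity_holds(n):
    """Check sum_{i=1}^{n} F_i == F_{n+2} - 1."""
    return sum(fib(i) for i in range(1, n + 1)) == fib(n + 2) - 1


def weighted_identity_holds(n):
    """Check sum_{i=1}^{n} i * F_{n-i} == F_{n+3} - (n + 2)."""
    return sum(i * fib(n - i) for i in range(1, n + 1)) == fib(n + 3) - (n + 2)


print(all(sum_identity_holds(n) and weighted_identity_holds(n) for n in range(1, 40)))
```

For instance, with $n = 4$ the weighted sum is $F_3 + 2F_2 + 3F_1 + 4F_0 = 7 = F_7 - 6$.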

Because the Lucas and Fibonacci numbers are related as $L_n = F_{n-1} + F_{n+1}$
for $n \ge 1$, we have the following result.

Corollary 4. For $\mathcal{F}_n$ with $n \ge 2$, $\mathrm{sk}(\ell_1) = -F_n + 1$ and $|\ell_1| = L_n - 1$.

Since $\mathcal{K}_1 = \ell_1 1$ for $\mathcal{F}_n$ with $n > 1$, we know that $\mathrm{sk}(\mathcal{K}_1) = -G^2_{n-1}$ and
$|\mathcal{K}_1| = H^2_n$. Hence, Theorem 1 holds for $m = 0, 1$.

4 Generalizing to higher orders 

We generalize compositions of an integer $n$ into parts greater than 1 to
accommodate Lyndon word primitives $0^i 1^j$ of order $i \ge 1$. We use the notation
$x^{(y)} = (x^{(y-1)})'$ where $x^{(0)} = x$, so $x^{(1)} = x'$, $x^{(2)} = x''$, etc. We say that

$$n = x_1^{(y_1)} + x_2^{(y_2)} + \cdots + x_k^{(y_k)}$$

is an $m$-colored composition of $n$ into $k$ parts greater than 1 if

• $\sum_{i=1}^{k} x_i = n$ with $x_i \ge 2$,

• and $0 \le y_i \le \min\{x_i - 2, m - 1\}$ for $1 \le i \le k$.

Note that $x_i^{(y_i)} = x_j^{(y_j)}$ if and only if $x_i = x_j$ and $y_i = y_j$. For instance, 5, 5',
2 + 3, 3 + 2, 3' + 2, and 2 + 3' are the six 2-colored compositions of 5.
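
The definition translates directly into a recursive enumeration. The sketch below represents a colored part $x^{(y)}$ as the pair `(x, y)`; the generator name and the pair encoding are choices made for this example.

```python
def colored_compositions(n, m):
    """Yield the m-colored compositions of n as tuples of (part, color) pairs with part >= 2."""
    if n == 0:
        yield ()  # the empty composition, used only to terminate the recursion
        return
    for part in range(2, n + 1):
        # A part x may carry any color y with 0 <= y <= min(x - 2, m - 1).
        for color in range(min(part - 2, m - 1) + 1):
            for rest in colored_compositions(n - part, m):
                yield ((part, color),) + rest


def show(comp):
    """Render a composition using primes for colors, e.g. 3' + 2."""
    return " + ".join(str(part) + "'" * color for part, color in comp)


for comp in colored_compositions(5, 2):
    print(show(comp))
```

Running it for $n = 5$ and $m = 2$ lists exactly the six compositions named above.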

For $m \ge 2$, let $P^m_n$ be the sequence defined by the $m$-th order recurrence
$P^m_n = \sum_{i=1}^{m} P^m_{n-i}$ with initial conditions $P^m_0 = \cdots = P^m_{m-3} = 0$, $P^m_{m-2} = 1$, and
$P^m_{m-1} = 0$.

The sequence $P^m_n$ is another generalization of the Fibonacci numbers. For
$m = 2$, we have that $P^2_n = F_{n-1}$ when $n \ge 1$. In general, an inductive argument
from the definition yields the following identities.

Lemma 5. Let $m \ge 2$. For $m \le n \le 2m - 2$, $P^m_n = 2^{n-m}$. Also, $P^m_{2m-1} =
2^{m-1} - 1$.

Let $d(m, n)$ be the number of $m$-colored compositions of $n$ into parts greater
than 1 for $m, n \ge 1$.

Theorem 6.

$$d(m, n) = P^{m+1}_{n+m-1}.$$

Proof. The $m$-colored compositions of $n$ satisfy an $(m + 1)$-th order recursion
as follows. Let

$$x = \left(x_1^{(y_1)}, x_2^{(y_2)}, \ldots, x_k^{(y_k)}\right)$$

for $\sum_{i=1}^{k} x_i = n$ with integers $x_i \ge 2$ and $0 \le y_i \le \min\{x_i - 2, m - 1\}$.
Suppose $n > m + 2$. If $y_k < x_k - 2$, then

$$\left(x_1^{(y_1)}, \ldots, x_{k-1}^{(y_{k-1})}, (x_k - 1)^{(y_k)}\right)$$

is an $m$-colored composition of $n - 1$. Otherwise, $2 \le x_k \le m + 1$ and

$$\left(x_1^{(y_1)}, x_2^{(y_2)}, \ldots, x_{k-1}^{(y_{k-1})}\right)$$

is an $m$-colored composition of $n - x_k$. For $n = m + 2$, the recurrence has only
$m$ terms since there is no $m$-colored composition of 1 with parts $\ge 2$.

For initial conditions, we consider $m$-colored compositions of integers $n$ with
$1 \le n \le m + 1$. We have that $d(m, 1) = 0 = P^{m+1}_{m}$ and $d(m, 2) = 1 = P^{m+1}_{m+1}$ for
all $m$. We claim that $d(m, n) = 2^{n-2}$ for $2 \le n \le m + 1$.

When $2 \le n \le m$, the only symbols that can occur in the $m$-colored
composition of $n$ are $\{2, 3, 3', 4, 4', 4'', 5, \ldots, n^{(n-2)}\}$ and $d(m, n) = d(m - 1, n)$.

Consider $d(m, m + 1)$. There is exactly one $m$-colored composition of $m + 1$
which is not an $(m-1)$-colored composition of $m + 1$, namely $x = ((m+1)^{(m-1)})$.
Hence $d(m, m + 1) = 1 + d(m - 1, m + 1)$. Inductively, then,

$$d(m, m + 1) = 1 + \sum_{i=1}^{m} d(m - 1, i) = 1 + 0 + \sum_{i=2}^{m} 2^{i-2} = 2^{m-1}.$$

□ 
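
Theorem 6 can be cross-checked numerically. The sketch below assumes the reading $d(m, n) = P^{m+1}_{n+m-1}$, which agrees with the initial conditions $d(m,1) = 0 = P^{m+1}_m$ and $d(m,2) = 1 = P^{m+1}_{m+1}$ used in the proof; the function names `d` and `P` simply mirror the notation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def d(m, n):
    """Number of m-colored compositions of n with parts >= 2 (d(m, 0) = 1 for the empty one)."""
    if n == 0:
        return 1
    # A first part x can be colored in min(x - 2, m - 1) + 1 ways.
    return sum((min(part - 2, m - 1) + 1) * d(m, n - part) for part in range(2, n + 1))


def P(m, n):
    """P^m_n for m >= 2: P_0..P_{m-1} are zero except P_{m-2} = 1, then the m-th order sum recurrence."""
    seq = [0] * m
    seq[m - 2] = 1
    while len(seq) <= n:
        seq.append(sum(seq[-m:]))
    return seq[n]


print(d(2, 5), P(3, 6))  # 6 6
print(all(d(m, n) == P(m + 1, n + m - 1) for m in range(1, 8) for n in range(1, 20)))
```

The first line reproduces the six 2-colored compositions of 5. The check also covers the claim $d(m, n) = 2^{n-2}$ for $2 \le n \le m + 1$ via Lemma 5.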






It is again straightforward to see that the sequence $P^m_n$ has the generating
function

$$p_m(x) = \frac{x^{m-2}(1 - x)}{d_m(x)}$$

where

$$d_m(x) = 1 - x - x^2 - \cdots - x^m = 1 - x \sum_{i=0}^{m-1} x^i$$

as in the generating functions for $G^m_n$ and $H^m_n$ from Section 2.

Let $Z = \{2, 3, 3', 4, 4', 4'', \ldots\}$ be the set of colored integers. Let $\psi$ be a
mapping from binary strings $0^i 1^j$ for $i, j \ge 1$ to $Z$ where $\psi(0^i 1^j) = (i + j)^{(i-1)}$.

Recall that $\mathcal{L}_m$ is the concatenation of $\ell_m \ell_{m-1} \cdots \ell_1$ from $\mathcal{F}_n$, where $\ell_i = \epsilon$
for $i > n - 1$. Let $\Psi(m, n)$ be the $m$-colored multiset obtained by applying $\psi$
to the primitives of order $1 \le i \le m$ from $\mathcal{L}_m$. Let $c(m, n, k)$ be the number of
integers $k^{(0)} = k \ge 2$ in $\Psi(m, n)$.

Theorem 7. For $m \ge 1$, $k \ge 2$, the count $c(m, n, k)$ is the coefficient of $x^n$ in

$$x^{k-m+1}\, p_{m+1}(x) = \frac{x^k (1 - x)}{d_{m+1}(x)}.$$



Proof. The argument is essentially the same as the proof of Theorem 2, except
that we consider $m$-colored compositions $x$ of $n - k$ with parts greater than 1
and give the result in terms of generating functions. The rotational symmetries
of $\omega(x)$, where the definition is extended to higher order primitives, depend on
which parts of $x$ are $k^{(0)} = k$. □

By exchanging the colors of $k^{(0)}$ and some $k^{(i)}$ with $0 < i \le \min\{k - 2, m - 1\}$
occurring in the $m$-colored compositions of $n - k$, we see that the number of
occurrences of $k^{(i)}$ is also $c(m, n, k)$.

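Theorem 8. Consider $\mathcal{F}_n$ and $\mathcal{L}_m$ with $n > 0$ and $m \ge 1$. Then $e(\mathcal{L}_m)$ is the
coefficient of $x^n$ in

$$\frac{\sum_{i=1}^{m} i\, x^{i+1}}{d_{m+1}(x)}$$

and $\delta(\mathcal{L}_m)$ is the coefficient of $x^n$ in

$$\frac{\sum_{i=1}^{m} x^{i+1}}{(1 - x)\, d_{m+1}(x)}.$$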

Proof. Let $1 \le i \le m$ be fixed. Each primitive of order $i$ contributes $i$ zeros
to $e(\mathcal{L}_m)$, and the number of colored integers $k^{(i-1)}$ in $\Psi(m, n)$ is the sum of
$c(m, n, k)$ for $i + 1 \le k \le n$. According to Theorem 7, this is the sum of the first
$n - (i + 1) + 1$ terms of the sequence $a_0, a_1, a_2, \ldots$ whose generating function is

$$\frac{1 - x}{d_{m+1}(x)},$$

which is the coefficient of $x^{n-i-1}$ in the series

$$\frac{1}{d_{m+1}(x)}.$$

The result for $e(\mathcal{L}_m)$ follows by a weighted summation over all $1 \le i \le m$, with
the exponents adjusted appropriately.

Each primitive of order $i$ and length $k$ contributes $(k - i)$ ones to $\delta(\mathcal{L}_m)$.
To calculate the contribution for a given $i$, we again sum over the first $n - i$
terms of the sequence $a_0, a_1, a_2, \ldots$, except that now each term $a_j$ is weighted
by $(n - i - j)$. This is the coefficient of $x^{n-i-1}$ in the series

$$\frac{1}{(1 - x)\, d_{m+1}(x)}.$$

Summing over the possible $i$'s yields $\delta(\mathcal{L}_m)$. □

Recall that $\mathcal{K}_m$ is the proper suffix of $\mathcal{F}_n$ consisting of the Lyndon words of
length $d \mid n$ containing at most $m$ consecutive 0's; that is, $\mathcal{K}_m = \mathcal{L}_m \ell_0$.

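Theorem 9. Consider $\mathcal{F}_n$ and $\mathcal{K}_m$ with $n \ge 1$ and $m \ge 0$. Then $\mathrm{sk}(\mathcal{K}_m)$ is the
coefficient of $x^n$ in

$$\frac{-x + \sum_{i=3}^{m+1} (i - 2)\, x^i}{d_{m+1}(x)}$$

and $|\mathcal{K}_m|$ is the coefficient of $x^n$ in

$$\frac{\sum_{i=1}^{m+1} i\, x^i}{d_{m+1}(x)}.$$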

Proof. We have $\mathrm{sk}(\mathcal{K}_m) = e(\mathcal{K}_m) - \delta(\mathcal{K}_m)$ where $e(\mathcal{K}_m) = e(\mathcal{L}_m)$ and $\delta(\mathcal{K}_m) =
\delta(\mathcal{L}_m) + 1$. Adding $x/(1 - x)$ to the generating function for $\delta(\mathcal{L}_m)$ yields

$$\frac{x (1 - x^{m+1})}{(1 - x)\, d_{m+1}(x)}.$$

Taking the difference with $e(\mathcal{K}_m)$ gives

$$\frac{-x + \sum_{i=2}^{m+1} x^i - (m - 1)\, x^{m+2}}{(1 - x)\, d_{m+1}(x)},$$

which simplifies to the desired result. Similarly, adding the two series yields

$$\frac{x + \sum_{i=2}^{m+1} x^i - (m + 1)\, x^{m+2}}{(1 - x)\, d_{m+1}(x)},$$

which again simplifies. □

Offsetting the sequence $G^m_n$ by an initial zero, and recalculating the
generating function with the initial $H^m_0 = m$ replaced by a zero, we have the result
stated in Theorem 1.
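
The passage from the generating functions to Theorem 1 can be verified by expanding the rational functions as power series. The sketch below assumes the numerators $-x + \sum_{i=3}^{m+1}(i-2)x^i$ and $\sum_{i=1}^{m+1} i\,x^i$ over the common denominator $d_{m+1}(x)$, and compares their coefficients with $-G^{m+1}_{n-1}$ and $H^{m+1}_n$; all helper names are hypothetical.

```python
def series_quotient(num, den, terms):
    """First `terms` power-series coefficients of num(x)/den(x); requires den[0] == 1."""
    out = []
    for n in range(terms):
        coeff = num[n] if n < len(num) else 0
        coeff -= sum(den[i] * out[n - i] for i in range(1, min(n, len(den) - 1) + 1))
        out.append(coeff)
    return out


def G(m, n):
    seq = [1] * m
    while len(seq) <= n:
        seq.append(sum(seq[-m:]))
    return seq[n]


def H(m, n):
    seq = [m] + [2 ** i - 1 for i in range(1, m)]
    while len(seq) <= n:
        seq.append(sum(seq[-m:]))
    return seq[n]


def generating_functions_match(m, terms=25):
    den = [1] + [-1] * (m + 1)              # d_{m+1}(x) = 1 - x - ... - x^(m+1)
    skew_num = [0] * (m + 2)
    skew_num[1] = -1
    for i in range(3, m + 2):
        skew_num[i] = i - 2                 # -x + sum_{i=3}^{m+1} (i-2) x^i
    length_num = list(range(m + 2))         # sum_{i=1}^{m+1} i x^i
    sk = series_quotient(skew_num, den, terms)
    ln = series_quotient(length_num, den, terms)
    return all(sk[n] == -G(m + 1, n - 1) and ln[n] == H(m + 1, n)
               for n in range(1, terms))


print(all(generating_functions_match(m) for m in range(0, 10)))
```

For $m = 2$ the two expansions begin $-x - x^2 - x^3 - 3x^4 - \cdots$ and $x + 3x^2 + 7x^3 + 11x^4 + \cdots$, in agreement with $G^3_n$ and $H^3_n$.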